Recognition-based handwritten Chinese character segmentation using a probabilistic Viterbi algorithm

Authors

  • Yi-Hong Tseng
  • Hsi-Jian Lee
Abstract

This paper presents a recognition-based character segmentation method for handwritten Chinese characters. Possible non-linear segmentation paths are first located using a probabilistic Viterbi algorithm. Candidate segmentation paths are then selected by checking overlapping paths, between-character gaps, and adjacent-path distances. A segmentation graph is constructed in which each candidate path is a node, and two nodes separated by an appropriate distance are connected by an arc. The cost of each arc is a function of character recognition distance, squareness of the character, and internal gaps in the character. Once the shortest path through the segmentation graph is found, the nodes on that path represent the optimal segmentation paths. For the experiments, 125 text-line images were collected from seven form documents; together these text lines contain 1132 handwritten Chinese characters. The average segmentation rate in our experiments is 95.58%. Moreover, the probabilistic Viterbi algorithm is modified slightly to extract text lines from document pages by obtaining non-linear paths when the gaps between text lines are not obvious. The algorithm can also be modified to segment characters from printed text-line images by adjusting the parameters that define the arc costs in the segmentation graph. © 1999 Elsevier Science B.V. All rights reserved.
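The shortest-path step over the segmentation graph can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `best_segmentation` and `squareness_cost`, the width limits, and the toy arc cost are all assumptions made for illustration. Candidate segmentation paths are reduced here to x-positions, and the arc cost uses only a squareness term; the paper's cost also includes recognition distance and internal gaps, which are omitted.

```python
from typing import Callable, List, Tuple


def squareness_cost(line_height: int) -> Callable[[int, int], float]:
    """Hypothetical arc cost: penalise segments whose width differs from the
    text-line height. Recognition distance and internal-gap terms from the
    abstract are deliberately left out of this toy cost."""
    def cost(left: int, right: int) -> float:
        return abs((right - left) - line_height) / line_height
    return cost


def best_segmentation(
    cuts: List[int],
    arc_cost: Callable[[int, int], float],
    min_width: int,
    max_width: int,
) -> Tuple[float, List[int]]:
    """Shortest path from the first to the last candidate cut.

    `cuts` must be sorted left to right. Two cuts are joined by an arc only
    when their distance lies in [min_width, max_width] (assumed limits), so
    the graph is acyclic and one left-to-right pass is enough.
    """
    n = len(cuts)
    inf = float("inf")
    dist = [inf] * n   # best cost found so far to reach each cut
    prev = [-1] * n    # predecessor index on that best path
    dist[0] = 0.0
    for j in range(1, n):
        for i in range(j):
            width = cuts[j] - cuts[i]
            if dist[i] == inf or not (min_width <= width <= max_width):
                continue
            candidate = dist[i] + arc_cost(cuts[i], cuts[j])
            if candidate < dist[j]:
                dist[j], prev[j] = candidate, i
    if dist[-1] == inf:
        return inf, []  # the last cut cannot be reached
    path = []
    k = n - 1
    while k != -1:      # walk predecessors back to the first cut
        path.append(cuts[k])
        k = prev[k]
    return dist[-1], path[::-1]


if __name__ == "__main__":
    total, chosen = best_segmentation(
        [0, 14, 30, 45, 61, 90], squareness_cost(30), min_width=10, max_width=45
    )
    print(chosen, round(total, 4))
```

Because arcs only run from left to right, a single dynamic-programming pass suffices; a general shortest-path routine such as Dijkstra's algorithm would give the same result on this graph.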


Similar resources

Viterbi Based Alignment between Text Images and their Transcripts

An alignment method based on the Viterbi algorithm is proposed to find mappings between word images of a given handwritten document and their respective (ASCII) words on its transcription. The approach takes advantage of the underlying segmentation made by Viterbi decoding in handwritten text recognition based on Hidden Markov Models (HMMs). Two HMMs modelling schemes are evaluated: one using 7...


Handwritten Character Recognition using Modified Gradient Descent Technique of Neural Networks and Representation of Conjugate Descent for Training Patterns

The purpose of this study is to analyze the performance of Back propagation algorithm with changing training patterns and the second momentum term in feed forward neural networks. This analysis is conducted on 250 different words of three small letters from the English alphabet. These words are presented to two vertical segmentation programs which are designed in MATLAB and based on portions (1...


Handwritten Chinese character recognition using kernel active handwriting model

This paper describes a kernel active handwriting model (K-AHM) and its application to handwritten Chinese character recognition. In the model, the kernel principal component analysis is applied to capture nonlinear variations caused by handwriting, and a fitness function on the basis of chamfer distance transform is introduced to search for the op...


Segmentation of Handwritten Characters for Digitalizing Korean Historical Documents

The historical documents are valuable cultural heritages and sources for the study of history, social aspect and life at that time. The digitalization of historical documents aims to provide instant access to the archives for the researchers and the public, who had been endowed with limited chance due to maintenance reasons. However, most of these documents are not only written by hand in ancie...


Research of Chinese Handwritten Text Segmentation Algorithm

OCR is a complicated process, there are many factors that can influence the recognition rate. Early period people tried to optimize the classifier to obtain high recognition rate, but the premise is that there is only one character no matter print or handwritten. For the performance of classifier has been promoted a lot, recognition rate for single character is high enough for commercial use. W...



Journal:
  • Pattern Recognition Letters

Volume 20  Issue 

Pages  -

Publication date: 1999